
Large Memory Layers with Product Keys

Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Herve Jegou

Neural Information Processing Systems

This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture by up to a billion parameters, with negligible computational overhead.
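The product-key idea behind this memory can be sketched as follows: instead of scoring a query against N keys directly, the query is split in half and scored against two sub-key sets of size √N each, and the top candidates from the implicit N = √N × √N product set are found from the two small score lists. This is an illustrative numpy sketch under those assumptions, not the paper's implementation; `product_key_lookup` and all array shapes are hypothetical.

```python
import numpy as np

def product_key_lookup(q, K1, K2, k=4):
    """Select the top-k of |K1|*|K2| implicit product keys while scoring
    only the two small sub-key sets (illustrative sketch, not the
    authors' code). q has dimension d; K1, K2 hold sub-keys of dim d/2."""
    d = q.shape[0] // 2
    s1 = K1 @ q[:d]             # scores of first query half vs. K1
    s2 = K2 @ q[d:]             # scores of second query half vs. K2
    i1 = np.argsort(-s1)[:k]    # top-k candidates from each half;
    i2 = np.argsort(-s2)[:k]    # the true top-k pairs must lie here
    # score the k*k candidate pairs; a full key's score is the sum of halves
    cand = [(s1[i] + s2[j], i * K2.shape[0] + j) for i in i1 for j in i2]
    cand.sort(reverse=True)
    return [idx for _, idx in cand[:k]]
```

Because a pair's score is the sum of its two half-scores, any pair in the global top-k must have each half in that half's top-k, so restricting the search to the k × k candidate grid is exact while touching only 2√N keys.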


Sequencer: Deep LSTM for Image Classification

Neural Information Processing Systems

The architecture uses the output of a BiLSTM [77]. We adopt the AdamW optimizer; following the previous study, the batch sizes for Sequencer2D-S, Sequencer2D-M, and Sequencer2D-L are 2048, 1536, and 1024, respectively.


e97d1081481a4017df96b51be31001d3-Supplemental-Conference.pdf

Neural Information Processing Systems

Kinetics action classification. Our settings mainly follow [31, 77] and [39, 77]. We report top-1 and top-5 accuracy on the validation set. Entries using spatial resolution > 224² are noted in gray; entries using in-house data for supervision are in light blue. Importantly, our method is much simpler than many other entries.





Incorporating BERT into Parallel Sequence Decoding with Adapters

Neural Information Processing Systems

While large-scale pre-trained language models such as BERT [5] have achieved great success on various natural language understanding tasks, how to efficiently and effectively incorporate them into sequence-to-sequence models and the corresponding text generation tasks remains a non-trivial problem.